Automatic Generation of Natural Language Summaries
Abstract
Automatic text summarization has attracted considerable interest in recent years, since it could, at least in principle, make information seeking in large document collections less tedious and time-consuming. Most existing summarization methods generate summaries by first extracting the sentences most relevant to the user’s query from documents returned by an information retrieval engine. In this thesis, we present a new competitive sentence extraction method that assigns relevance scores to the sentences of the texts to be summarized. Coupled with a simple method to avoid selecting redundant sentences, the resulting summarization system achieves state-of-the-art results on widely used benchmark datasets. Moreover, we propose two novel sentence compression methods, which rewrite a source sentence in a shorter form while retaining the most important information. The first method produces extractive compressions, i.e., it only deletes words, whereas the second produces abstractive compressions, i.e., it also uses paraphrasing. Experiments show that the extractive method generates compressions that are better than or comparable to those produced by state-of-the-art systems in terms of grammaticality and meaning preservation. The abstractive method, in turn, produces more varied (due to paraphrasing) and slightly shorter compressions than the extractive one. In terms of grammaticality and meaning preservation, the two methods have similar scores. Finally, we propose an optimization model that generates summaries by jointly se-
Similar resources
Applying Natural Language Generation to Indicative Summarization
Creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a generation perspective, by first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implement...
An Efficient Text Summarizer using Lexical Chains
We present a system which uses lexical chains as an intermediate representation for automatic text summarization. This system builds on previous research by implementing a lexical chain extraction algorithm in linear time. The system is reasonably domain independent and takes as input any text or HTML document. The system outputs a short summary based on the most salient concepts from the origi...
Génération de résumés par abstraction complète
This Ph.D. thesis is the result of several years of research on automatic text summarization. Three major contributions are presented in the form of published and yet to be published papers. They follow a path that moves away from extractive summarization and toward abstractive summarization. The first article describes the HexTac experiment, which was conducted to evaluate the performance of h...
A Hybrid Approach to Multi-document Summarization of Opinions in Reviews
We present a hybrid method to generate summaries of product and service reviews by combining natural language generation and salient sentence selection techniques. Our system, STARLET-H, receives as input textual reviews with associated rated topics, and produces as output a natural language document summarizing the opinions expressed in the reviews. STARLET-H operates as a hybrid abstractive/...
From Extractive to Abstractive Meeting Summaries: Can It Be Done by Sentence Compression?
Most previous studies on meeting summarization have focused on extractive summarization. In this paper, we investigate whether we can apply sentence compression to extractive summaries to generate abstractive summaries. We use different compression algorithms, including integer linear programming with an additional step of filler phrase detection, a noisy-channel approach using Markovization formulat...
Generating Natural Language Summaries for Multimedia
In this paper we introduce an automatic system that generates textual summaries of Internet-style video clips by first identifying suitable high-level descriptive features that have been detected in the video (e.g. visual concepts, recognized speech, actions, objects, persons, etc.). Then a natural language generator is constructed using SimpleNLG to compile the high-level features into a textu...